Asynchronous Gathering of Opaque Robots with Mobility Faults
Subhajit Pramanick, Saswata Jana, Partha Sarathi Mandal, Gokarna Sharma
We consider the fundamental benchmarking problem of gathering in an $(N,f)$-fault system consisting of $N$ robots, of which at most $f$ may fail in any execution, under asynchrony. Two seminal results established the impossibility of a solution in the oblivious robot (OBLOT) model in a $(2,0)$-fault system under semi-synchrony and in a $(3,1)$-Byzantine fault system under asynchrony. Recently, a breakthrough result circumvented the first impossibility by giving a deterministic algorithm in a $(2,0)$-fault system under asynchrony in the luminous robot (LUMI) model using 2-colored lights. However, another breakthrough result established the impossibility of gathering in a $(2,1)$-crash system in the LUMI model under semi-synchrony. In this paper, we consider a {\em mobility fault} model in which a robot crash impacts only its mobility, not the operation of its light. We establish four results under asynchrony in LUMI with the mobility fault model. We show that it is impossible to solve gathering in a $(2,1)$-mobility fault system using 2-colored lights, and then give a solution using 3-colored lights, which is optimal w.r.t. the number of colors. We then consider an $(N,f)$-mobility fault system, $f
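To make the fault model concrete, here is a minimal sketch of the look-compute-move cycle in the LUMI model under a mobility fault: the crashed robot keeps operating its light but never moves. The movement and light-switching rule below is a hypothetical placeholder, not the paper's 3-color algorithm.

```python
# Illustrative sketch of a look-compute-move cycle in the LUMI model with
# mobility faults: a crashed robot can still change its light but cannot move.
# The rule below is a placeholder, NOT the paper's gathering algorithm.
from dataclasses import dataclass

COLORS = ("A", "B", "C")  # 3-colored lights, as in the paper's upper bound

@dataclass
class Robot:
    pos: float
    light: str = "A"
    mobility_faulty: bool = False  # mobility fault: the light still works

def step(r: Robot, other: Robot) -> None:
    """One look-compute-move cycle under a toy synchronous scheduler."""
    # Look: snapshot of the other robot's position and light.
    target, seen = other.pos, other.light
    # Compute: placeholder rule -- advance the light color and head to the
    # midpoint (a real algorithm would condition the move on `seen`).
    r.light = COLORS[(COLORS.index(r.light) + 1) % 3]
    move_to = (r.pos + target) / 2
    # Move: suppressed entirely when the robot's mobility has crashed.
    if not r.mobility_faulty:
        r.pos = move_to

a, b = Robot(0.0), Robot(1.0, mobility_faulty=True)
for _ in range(20):
    step(a, b)
    step(b, a)
print(a.pos, b.pos)  # the non-faulty robot converges toward the crashed one
```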
A Dynamic Stackelberg Game Framework for Agentic AI Defense Against LLM Jailbreaking
As large language models (LLMs) are increasingly deployed in critical applications, the challenge of jailbreaking, where adversaries manipulate the models to bypass safety mechanisms, has become a significant concern. This paper presents a dynamic Stackelberg game framework to model the interactions between attackers and defenders in the context of LLM jailbreaking. The framework treats the prompt-response dynamics as a sequential extensive-form game, where the defender, as the leader, commits to a strategy while anticipating the attacker's optimal responses. We propose a novel agentic AI solution, the "Purple Agent," which integrates adversarial exploration and defensive strategies using Rapidly-exploring Random Trees (RRT). The Purple Agent actively simulates potential attack trajectories and intervenes proactively to prevent harmful outputs. This approach offers a principled method for analyzing adversarial dynamics and provides a foundation for mitigating the risk of jailbreaking.
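Since the approach leans on RRT, the following minimal sketch shows vanilla RRT exploration in a toy 2D space standing in for prompt space. Treating prompts as points in the plane, the disk-shaped "harmful region", and all constants are illustrative assumptions, not the Purple Agent's actual representation.

```python
# Minimal RRT sketch in a 2D toy space, standing in for the Purple Agent's
# exploration of attack trajectories. The coordinates, the "harmful region"
# disk, and the step size are illustrative assumptions only.
import math
import random

GOAL = (8.0, 8.0)     # hypothetical harmful-region center
GOAL_RADIUS = 0.5
STEP = 0.5

def steer(p, q, step=STEP):
    """Move from p toward q by at most `step`."""
    d = math.dist(p, q)
    if d <= step:
        return q
    t = step / d
    return (p[0] + t * (q[0] - p[0]), p[1] + t * (q[1] - p[1]))

def rrt(start=(0.0, 0.0), iters=5000, bounds=(0.0, 10.0)):
    nodes, parent = [start], {start: None}
    for _ in range(iters):
        sample = (random.uniform(*bounds), random.uniform(*bounds))
        nearest = min(nodes, key=lambda n: math.dist(n, sample))
        new = steer(nearest, sample)
        nodes.append(new)
        parent[new] = nearest
        if math.dist(new, GOAL) <= GOAL_RADIUS:
            return nodes, parent, new  # found a path into the (toy) region
    return nodes, parent, None

nodes, parent, hit = rrt()
print("reached harmful region:", hit is not None, "| tree size:", len(nodes))
```

In the paper's framing, a defender would intervene along such discovered trajectories before a deployed model reaches the harmful region; the sketch only illustrates the exploration primitive.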
Is Bellman Equation Enough for Learning Control?
Haoxiang You, Lekan Molu, Ian Abraham
The Bellman equation and its continuous-time counterpart, the Hamilton-Jacobi-Bellman (HJB) equation, serve as necessary conditions for optimality in reinforcement learning and optimal control. While the value function is known to be the unique solution to the Bellman equation in tabular settings, we demonstrate that this uniqueness fails to hold in continuous state spaces. Specifically, for linear dynamical systems, we prove the Bellman equation admits at least $\binom{2n}{n}$ solutions, where $n$ is the state dimension. Crucially, only one of these solutions yields both an optimal policy and a stable closed-loop system. We then demonstrate a common failure mode in value-based methods: convergence to unstable solutions due to the exponential imbalance between admissible and inadmissible solutions. Finally, we introduce a positive-definite neural architecture that guarantees convergence to the stable solution by construction to address this issue.
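The claimed count can be checked by hand in the scalar case: for $n = 1$, the continuous-time HJB equation for a linear-quadratic problem with dynamics $\dot{x} = ax + bu$ and cost $\int (qx^2 + ru^2)\,dt$ reduces to the quadratic Riccati equation $q + 2ap - b^2p^2/r = 0$, which has $\binom{2}{1} = 2$ roots, only one of them stabilizing. A minimal numerical check, with constants chosen arbitrarily:

```python
# Worked n = 1 instance of the paper's count: the scalar continuous-time
# Riccati equation q + 2*a*p - b^2*p^2/r = 0 has binom(2,1) = 2 solutions,
# and only one yields a stable closed loop. Constants a, b, q, r are arbitrary.
import math

a, b, q, r = 1.0, 1.0, 1.0, 1.0   # dynamics x' = a*x + b*u, cost q*x^2 + r*u^2
disc = math.sqrt(a * a + b * b * q / r)
for p in (r * (a + disc) / b**2, r * (a - disc) / b**2):
    k = b * p / r                  # candidate optimal gain, u = -k*x
    closed_loop = a - b * k        # pole of the closed loop x' = (a - b*k)*x
    hjb_residual = q + 2 * a * p - (b * p) ** 2 / r
    print(f"p = {p:+.4f}  HJB residual = {hjb_residual:+.1e}  "
          f"closed-loop pole = {closed_loop:+.4f}  "
          f"{'stable' if closed_loop < 0 else 'UNSTABLE'}")
```

Both candidates satisfy the HJB equation exactly, but only the positive root gives a negative closed-loop pole, matching the paper's point that value-based methods can converge to a valid yet unstable solution.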
Leveraging Noisy Observations in Zero-Sum Games
Emmanouil M. Athanasakos, Samir M. Perlaza
This paper studies an instance of zero-sum games in which one player (the leader) commits to its opponent (the follower) to choose its actions by sampling a given probability measure (strategy). The actions of the leader are observed by the follower as the output of an arbitrary channel. In response, the follower chooses its action based on its current information, that is, the leader's commitment and the corresponding noisy observation of the leader's action. Within this context, the equilibrium of the game with noisy action observability is shown to always exist, and necessary conditions for its uniqueness are identified. Interestingly, the noisy observations have an important impact on the cardinality of the follower's set of best responses. Under particular conditions, this set of best responses is proved to be a singleton almost surely. The proposed model captures any channel noise with a density with respect to the Lebesgue measure. As an example, the case in which the channel is described by a Gaussian probability measure is investigated.
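A toy numerical sketch of the follower's side, assuming a binary action set, a $2 \times 2$ zero-sum payoff matrix, and a Gaussian channel; all values are illustrative choices, not from the paper:

```python
# Toy sketch of the follower's best response under a Gaussian channel.
# The payoff matrix, strategy, and noise level are illustrative assumptions.
import math

U = [[1.0, -1.0], [-1.0, 1.0]]  # zero-sum payoffs U[leader][follower] to the leader
strategy = [0.6, 0.4]           # leader's committed mixed strategy over actions {0, 1}
sigma = 0.8                     # channel: follower sees y = leader_action + N(0, sigma^2)

def gauss(y, mean):
    return math.exp(-(y - mean) ** 2 / (2 * sigma**2)) / (sigma * math.sqrt(2 * math.pi))

def best_response(y):
    """Posterior over the leader's action given y, then a payoff-minimizing reply."""
    w = [strategy[x] * gauss(y, float(x)) for x in (0, 1)]
    post = [wi / sum(w) for wi in w]
    # The follower minimizes the leader's expected payoff. A tie would give a
    # non-singleton best-response set; with this channel, ties occur only on a
    # measure-zero set of observations y.
    exp_payoff = [sum(post[x] * U[x][bf] for x in (0, 1)) for bf in (0, 1)]
    return min((0, 1), key=lambda bf: exp_payoff[bf]), post

for y in (-0.5, 0.5, 1.5):
    br_action, post = best_response(y)
    print(f"y = {y:+.1f}  posterior = ({post[0]:.2f}, {post[1]:.2f})  best response = {br_action}")
```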
Smoothed Online Learning is as Easy as Statistical Learning
Adam Block, Yuval Dagan, Noah Golowich, Alexander Rakhlin
Much of modern learning theory has been split between two regimes: the classical \emph{offline} setting, where data arrive independently, and the \emph{online} setting, where data arrive adversarially. While the former model is often both computationally and statistically tractable, the latter requires no distributional assumptions. In an attempt to achieve the best of both worlds, previous work proposed the smooth online setting where each sample is drawn from an adversarially chosen distribution, which is smooth, i.e., it has a bounded density with respect to a fixed dominating measure. We provide tight bounds on the minimax regret of learning a nonparametric function class, with nearly optimal dependence on both the horizon and smoothness parameters. Furthermore, we provide the first oracle-efficient, no-regret algorithms in this setting. In particular, we propose an oracle-efficient improper algorithm whose regret achieves optimal dependence on the horizon and a proper algorithm requiring only a single oracle call per round whose regret has the optimal horizon dependence in the classification setting and is sublinear in general. Both algorithms have exponentially worse dependence on the smoothness parameter of the adversary than the minimax rate. We then prove a lower bound on the oracle complexity of any proper learning algorithm, which matches the oracle-efficient upper bounds up to a polynomial factor, thus demonstrating the existence of a statistical-computational gap in smooth online learning. Finally, we apply our results to the contextual bandit setting to show that if a function class is learnable in the classical setting, then there is an oracle-efficient, no-regret algorithm for contextual bandits in the case that contexts arrive in a smooth manner.
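A minimal sketch of the setting, assuming a threshold class on $[0,1]$ and a sliding-window $\sigma$-smooth adversary: the learner makes one ERM-oracle call per round. This is plain follow-the-leader, illustrating the oracle model rather than the paper's oracle-efficient algorithms.

```python
# Illustrative sketch of the smooth online setting: each x_t is drawn from an
# adversarially chosen distribution whose density w.r.t. Uniform[0,1] is at
# most 1/SIGMA. The learner answers via a single ERM-oracle call per round.
# Threshold class and window adversary are assumptions for the sketch.
import random

SIGMA = 0.25          # smoothness parameter: density bounded by 1/SIGMA
TRUE_THRESHOLD = 0.5  # hidden target f*(x) = 1[x >= 0.5] (realizable case)

def smooth_adversary(t):
    """Uniform on an adversarially slid window of mass SIGMA: a SIGMA-smooth draw."""
    lo = (0.37 * t) % (1 - SIGMA)
    return random.uniform(lo, lo + SIGMA)  # density 1/SIGMA on its window

def erm_oracle(history):
    """ERM over threshold classifiers h_c(x) = 1[x >= c] on the data so far."""
    cuts = [0.0] + sorted(x for x, _ in history) + [1.0]
    def errors(c):
        return sum(((x >= c) != y) for x, y in history)
    return min(cuts, key=errors)

history, mistakes = [], 0
for t in range(1, 301):
    x = smooth_adversary(t)
    c = erm_oracle(history)          # one oracle call per round
    pred, y = x >= c, x >= TRUE_THRESHOLD
    mistakes += int(pred != y)       # the comparator f* makes zero mistakes
    history.append((x, y))
print("cumulative mistakes over 300 rounds:", mistakes)
```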
Global Convergence of Policy Gradient for Linear-Quadratic Mean-Field Control/Game in Continuous Time
Weichen Wang, Jiequn Han, Zhuoran Yang, Zhaoran Wang
Reinforcement learning is a powerful tool for learning the optimal policy of possibly many agents interacting with an environment. As the number of agents grows very large, the system can be approximated by a mean-field problem, which has motivated new research directions in mean-field control (MFC) and mean-field games (MFG). In this paper, we study the policy gradient method for linear-quadratic mean-field control and games, where we assume each agent has identical linear state transitions and quadratic cost functions. While most recent work on policy gradient for MFC and MFG is based on discrete-time models, we focus on continuous-time models, where the analysis techniques may be of independent interest. For both MFC and MFG, we provide a policy gradient update and show that it converges to the optimal solution at a linear rate, which we verify with a synthetic simulation. For MFG, we also provide sufficient conditions for the existence and uniqueness of the Nash equilibrium.
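As a single-agent baseline (dropping the mean-field coupling entirely), linear convergence of policy gradient can be seen on a scalar continuous-time LQ problem, where the cost $J(k)$ of the gain $k$ in $u = -kx$ is available in closed form. All constants are arbitrary; with $a = b = q = r = 1$ the optimum is $k^* = 1 + \sqrt{2}$.

```python
# Minimal policy-gradient sketch on a scalar continuous-time LQ problem
# (single agent, no mean-field coupling): exact gradient descent on the
# feedback gain k in u = -k*x, with J(k) in closed form for x0 = 1.
import math

a, b, q, r = 1.0, 1.0, 1.0, 1.0
k_star = (a + math.sqrt(a * a + b * b * q / r)) / b  # gain from the stabilizing Riccati root

def cost(k):
    """J(k) = (q + r*k^2) / (2*(b*k - a)); finite only if b*k > a (stabilizing)."""
    return (q + r * k * k) / (2 * (b * k - a))

def grad(k):
    return (2 * r * k * (b * k - a) - b * (q + r * k * k)) / (2 * (b * k - a) ** 2)

k, lr = 3.5, 0.5  # initial stabilizing gain and step size
for it in range(30):
    k -= lr * grad(k)
    if it % 5 == 4:
        print(f"iter {it + 1:2d}  k = {k:.6f}  gap = {abs(k - k_star):.2e}  J = {cost(k):.6f}")
```

The printed gap shrinks by a roughly constant factor per iteration, the linear rate in miniature; the mean-field versions in the paper couple this recursion with the evolution of the population distribution.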